Memory structure and method for shuffling a stack of data utilizing buffer memory locations

ABSTRACT

PCT No. PCT/AU89/00460 Sec. 371 Date May 21, 1991 Sec. 102(e) Date May 21, 1991 PCT Filed Oct. 20, 1989 PCT Pub. No. WO90/04849 PCT Pub. Date May 3, 1990.

A memory structure which can operate as a stack or list, the structure comprising a plurality of contiguous memory locations sub-divided into contiguous sub-structures, each of the sub-structures having at least one buffer memory location associated with it, whereby stack or list shuffle operations can be performed in parallel on the sub-structures. The memory structure can be utilized in a content addressable memory and records can be maintained in sorted order by key in the memory structure. The content addressable memory can be implemented using currently available random access memory (RAM) structures and the content addressable memory can be implemented in very large scale integration (VLSI).

The present invention relates to a memory structure and, in particular, to a memory structure which is adapted to permit relatively fast shuffling of data stored in the structure through the structure, thereby commercially facilitating operations such as sorting.

DISCUSSION OF PRIOR ART

A particular, although by no means exhaustive, use for the memory structure outlined in this specification is in the field of content addressable memories (CAMs).

In the late 1970's it was realized that the majority of the work that computers were being called upon to perform in the majority of applications was associative in nature, including sorting information, accessing information by key and the like. It was also realized that the storing of information according to a memory address in a memory (e.g. random access memory (RAM) and the like) was not the most efficient way of storing that information where associative type operations were to be performed on that information. Ideally it was preferred that the information be stored according to specific search keys, and clustered in accordance with an algorithm which related the search keys in some way (for example alphabetical order, numerical order or the like).

Storing information in memory according to the content of the information being stored (ie. according to a key which is itself part of the stored information) rather than storing according to an address became known as content addressable memory (CAM). Software tree structures were and still are a software implementation of a content addressable memory. In essentially all cases to date, the memory in which the elements of that tree structure reside is still conventional random access memory with elements of the tree stored by address. Ideally a hardware content addressable memory structure should be much faster than a hardware RAM combined with a software tree structure. Various attempts have been made to date to make normally addressable RAM behave as content addressable memory thereby combining the inexpensiveness and large memory capacity of commercially available RAM with the desired CAM structure. U.S. Pat. No. 4,758,982 to PRICE discloses one such attempt and also provides a good summary of CAM issues. U.S. Pat. No. 4,758,983 to BERNDT discloses another attempt at making commercially available RAM behave as a CAM.

SUMMARY OF THE INVENTION

In at least one particular embodiment of the present invention, commercially available RAM is combined with surrounding hardware logic so as to provide a (relatively) very fast CAM structure.

In other embodiments of the invention a memory stack structure is disclosed which can be shuffled very rapidly, in an arbitrarily small and user selectable number of CPU cycles. This structure seeks to go at least some way in overcoming a commonly held belief in the industry that maintaining data in sorted list order is computationally highly inefficient (even though desirable).

In one broad form there is provided a memory structure for storing records. The structure comprises a plurality of contiguous memory locations each for storing one of the records, the plurality of memory locations being functionally separated into memory sub-structures, each of the memory sub-structures comprising a separate but contiguous sub-portion of the memory structure, each of the sub-structures including a buffer memory location attached thereto, each of the buffer memory locations being arranged to receive a record stored in a memory location within the associated sub-structure and to transfer a record stored in the buffer memory location to a memory location within another of the sub-structures, each of the buffer memory locations being further arranged to receive a record stored in a memory location in a sub-structure which is immediately adjacent to the sub-structure corresponding with the buffer memory and to transfer a record stored in the buffer memory location to a memory location in a sub-structure which is immediately adjacent the sub-structure located with the buffer memory location.

As used in this specification the term "contiguous" implies a structure which is ordered in a logical sense, but not necessarily a physical sense. The term probably best implies a separate but logical continuation (of memory structure) for the purposes of maintaining segmented but ordered data.

Similarly where memory locations are referred to as being above or below other memory locations, such descriptions are not to be taken literally, but rather should be read in a logical sense. In a particular embodiment of the invention, the arrangement of memory locations may be transposed sideways, which relates to transferring records across sub-structures, rather than up and down.

Preferably the memory structure is adapted to store the records in search key order; each of the records including a search key comprising at least a part of the record.

Preferably the memory structure performs as a stack or list, and a record is added to the stack or list at a chosen memory location within the memory structure by either shuffling all records at and above the chosen memory location up one memory location (UP SHUFFLE OPERATION) or shuffling all records at and below the chosen memory location down one memory location (DOWN SHUFFLE OPERATION) in the memory structure or by shuffling all records sideways in raster format when the sub-structures are transposed, and whereby a record is deleted from the stack or list by a logically opposite overwrite process.

In a further broad form there is provided a method of storing records in search key order in a memory structure, the memory structure comprising a plurality of contiguous memory locations wherein each memory location of the plurality of locations is adapted to store one of the records, the plurality of memory locations being functionally separated into memory sub-structures, each of the memory sub-structures comprising a separate but contiguous sub-portion of the memory structure, each sub-structure additionally including a buffer memory location attached to it, each buffer memory location adapted to receive a record stored in a memory location within the sub-structure or to transfer a record stored in the buffer memory location to a memory location within the sub-structure, the buffer memory location further adapted to receive a record stored in a memory location in a sub-structure which is immediately adjacent the sub-structure to which the buffer memory is attached or to transfer a record stored in the buffer memory location to a memory location in a sub-structure which is immediately adjacent the sub-structure to which the buffer memory is attached, the method comprising the steps of placing the records into contiguous memory locations in the structure ordered by search key.

In yet a further broad form there is provided a content addressable memory structure for storing records, the structure comprising a plurality of contiguous memory locations wherein each memory location of the plurality of locations is adapted to store one of the records, the plurality of memory locations being functionally separated into memory sub-structures, each of the memory sub-structures comprising a separate but contiguous sub-portion of the memory structure, each sub-structure additionally including a buffer memory location attached to it, each buffer memory location adapted to receive a record stored in a memory location within the sub-structure or to transfer a record stored in the buffer memory location to a memory location within the sub-structure, the buffer memory location being further adapted to receive a record stored in a memory location in a sub-structure which is immediately adjacent the sub-structure to which the buffer memory is attached or to transfer a record stored in the buffer memory location to a memory location in a sub-structure which is immediately adjacent the sub-structure to which the buffer memory is attached, the records maintained in the memory locations of the memory structure in sorted order by key.

In yet a further broad form there is provided a method of operating a memory structure so as to behave as a content addressable memory, a memory structure for storing records, the structure comprising a plurality of contiguous memory locations wherein each memory location of the plurality of locations is adapted to store one of the records, the plurality of memory locations being functionally separated into memory sub-structures, each of the memory sub-structures comprising a separate but contiguous sub-portion of the memory structure, each sub-structure additionally including a buffer memory location attached to it, each buffer memory location adapted to receive a record stored in a memory location within the sub-structure or to transfer a record stored in the buffer memory location to a memory location within the sub-structure, the buffer memory location being further adapted to receive a record stored in a memory location in a sub-structure which is immediately adjacent the sub-structure to which the buffer memory is attached or to transfer a record stored in the buffer memory location to a memory location in a sub-structure which is immediately adjacent the sub-structure to which the buffer memory is attached, the method comprising maintaining the records in the memory locations of the memory structure in sorted order by key.

BRIEF DESCRIPTION OF THE DRAWINGS

Embodiments of the invention will now be described with reference to the drawings wherein:

FIG. 1 shows a generalized embodiment of the memory structure of the present invention;

FIG. 2 shows a "FIND" operation using a binary search on a list (stack) of items;

FIG. 3 shows diagrammatically an "INSERT" operation on a list;

FIG. 4 shows a "DELETE" operation from a list;

FIG. 5 shows a block diagram of a first embodiment of the invention as a CAM structure;

FIG. 6 shows one embodiment of memory content movement by the back push-pull method;

FIG. 7 shows an alternative embodiment of memory content movement by the front push-pull method;

FIG. 8 shows a particular form of front push-pull termed segment push-pull;

FIG. 9 shows data items arranged for implementation of segment push-pull of FIG. 8;

FIG. 10 shows an alternative form of front push-pull known as split push-pull;

FIG. 11 shows data items arranged in storage for the split push-pull method of FIG. 10;

FIG. 12 shows a hardware implementation of a CAM embodiment of the invention utilizing commercially available RAM chips; and

FIG. 13 shows the multiple address range (MAR) method of division of chips or memory banks.

DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS

The following detailed description of the drawings relates to the embodiments of the invention implemented specifically as a content addressable memory structure. However, the description of the preferred embodiments should not be taken as limiting the uses to which the broadest form of the invention as claimed can be put in practice.

1. BASIC "SHUFFLE" STRUCTURE EMBODIMENTS

The broadest form of the invention relates to a particular memory structure concept. That concept is illustrated diagrammatically in FIG. 1. It should be emphasized at the start that FIG. 1 is conceptual and does not necessarily bear any direct physical relation to real life implementations of the memory structure of the invention. With computers and computer memory, it is not so much the actual physical location of memory locations relative to each other that is important, but rather the data path connections between the memory locations.

Referring to FIG. 1 the memory structure 1 of a first embodiment of the invention is shown to comprise a plurality of memory locations A1, A2, A3, A4, B1, B2, B3, B4, C1, C2, C3, C4, and so on down to Z1, Z2, Z3, Z4. These memory locations are contiguous (ie. they are linked together in the order stated and for data in memory location A4, for example, to reach memory location A2, the data must be processed through memory location A3). The memory locations A1 through to Z4 can therefore be thought of, notionally, as comprising a stack or list. In addition the memory locations are divided into sub-groupings of memory locations termed sub-structures 2, 3, 4, 5 (the sub-structures for memory locations D1 to Y4 are not shown but follow according to the concept already provided by FIG. 1). The sub-structures 2, 3, 4, 5 are ordered according to, and in, the same way as the memory locations so that they can be sub-grouped, ie. sub-structure 2 containing memory locations A1 to A4 is "above" sub-structure 3 containing memory locations B1 through to B4 and, similarly, sub-structure 4 is "below" sub-structure 3. Similarly, within sub-structure 2 memory location A1 is "above" memory location A2 while memory location A4 is "below" memory location A3.

In addition to the ordered memory structure described in FIG. 1 there is also attached to each sub-structure 2, 3, 4, 5, a buffer memory location 6, 7, 8, 9 respectively. These buffer memory locations are not intended for normal storage of information in memory, but rather, exist for the purpose of holding what amounts to "overflow" data arising as a result of shuffling of data up or down the memory locations in each sub-structure 2, 3, 4, 5. The buffer memory locations allow memory shuffle in all sub-structures 2, 3, 4, 5 together (ie. in parallel).

To take one example applied to FIG. 1 (later used in a content addressable memory embodiment termed "split push-pull"), if one assumes that it takes four clock cycles to shift the contents of memory locations A1 through to A4 down by one location (ie. the contents of A4 fall from [or are initially pushed from] the bottom of sub-structure 2 into the buffer memory location 6, the contents of A3 move to A4, the contents of A2 move to A3 and the contents of A1 move to A2) then in those same four clock cycles the memory contents of sub-structures 3, 4 and 5 are also shifted downward by one memory location. To complete the downward shuffle movement, in the next few clock cycles the contents of buffer memory location 6 are transferred to memory location B1 at the top of sub-structure 3, the contents of buffer location 7 are transferred to memory location C1 in sub-structure 4 and so on (in parallel) for all of the sub-structures comprising the memory structure 1. Essentially, therefore, a downward memory shuffle is comprised of two steps: a first step (possibly commencing with an initial push of the lowermost memory location contents of each sub-structure into the buffer memory location of that sub-structure) where the memory contents in each sub-structure are shifted downward (all sub-structures carrying out this function at the same time, in parallel), followed by a second step in which the overflow from the first step (now residing in the buffer memory locations) is transferred (again in a parallel operation) to the appropriate memory location in the adjacent sub-structure.
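The two-step downward shuffle just described can be sketched in software as follows. This is only an illustrative model, not the hardware of the invention: the function name `down_shuffle`, the list-of-lists representation and the assumption that the buffer transfer costs a single cycle are all hypothetical. The loops here run sequentially, whereas in the memory structure each step is performed on all sub-structures at once.

```python
def down_shuffle(subs):
    """Shift every record down one location across all sub-structures.

    `subs` is a list of equal-depth lists, topmost sub-structure first.
    Returns (new_subs, overflow, steps).
    """
    depth = len(subs[0])

    # Step 1: every sub-structure pushes its lowest record into its own
    # buffer and shifts its remaining records down one location.
    # In hardware all sub-structures do this together, so the cost is
    # `depth` cycles regardless of how many sub-structures exist.
    buffers = [sub[-1] for sub in subs]
    shifted = [[None] + sub[:-1] for sub in subs]

    # Step 2: each buffer is transferred to the top location of the
    # sub-structure immediately below (again one parallel operation).
    for i in range(1, len(shifted)):
        shifted[i][0] = buffers[i - 1]

    overflow = buffers[-1]   # record pushed off the bottom of the structure
    steps = depth + 1        # assumption: the transfer takes one cycle
    return shifted, overflow, steps


subs = [["a1", "a2", "a3", "a4"],
        ["b1", "b2", "b3", "b4"],
        ["c1", "c2", "c3", "c4"]]
new_subs, overflow, steps = down_shuffle(subs)
```

After the call, the top location of the first sub-structure is free (`None`), the record formerly in A4 sits at the top of the second sub-structure, and the step count depends only on the depth of a sub-structure.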

An up shuffle can be carried out in the same way with the contents of each of the sub-structures being shifted upward in a parallel operation with the overflow from the top of each sub-structure being stored in the buffer memory location of the sub-structure located immediately above followed by a transfer of the contents of the buffer memory location into the lowest memory location of the sub-structure to which the buffer memory location is attached.

In the example just described in relation to FIG. 1 it is assumed that the size (ie. data carrying capacity) of each of the memory locations and of the buffer memory locations is the same. Also, this example specifically shows what amounts to 26 sub-structures each containing four memory locations (and one buffer memory location of the same size as any one of the individual memory locations).

In a further example applied to FIG. 1, the sub-structures 2, 3, 4, 5 together with their associated buffer memory locations 6, 7, 8, 9 are transposed so as best to be thought of as lying side by side as adjacent columns. (This example is later described applied to a content addressable memory embodiment termed "segment push-pull" as illustrated in FIGS. 7 & 8.) In this second example shuffling of memory contents takes place in a slightly different way with greater use being made of the buffer memory locations 6, 7, 8, 9 during any given shuffle operation. In this example the "ordering" of the memory contents is from left to right with the top "row" containing memory locations A1, B1, C1, . . . Z1 and the next "row" containing memory locations A2, B2, C2, . . . Z2 and so on. To shuffle the memory contents, the contents of location A3 are moved into buffer memory location 6. At the same time, and in parallel, the contents of memory location B3 are moved into buffer memory location 7, the contents of memory location C3 are moved into buffer memory location 8, and the contents of memory location Z3 are moved into buffer memory location 9. As a second step or operation, the contents of the buffer memory locations 6, 7, 8, 9 are transferred to the immediately adjacent column (sub-structure) except for the contents of buffer memory location 9 which "wrap around" and are placed in the leftmost sub-structure in the next row. In summary, buffer memory location 6 transfers to memory location B3, buffer memory location 7 transfers to memory location C3, buffer memory location 8 transfers to memory location D3, and, at the end column, buffer memory location 9 transfers to memory location A4 in sub-structure 2. This results in a generally left to right raster scan type shuffle. As with the first example the rate at which a complete shuffle is carried out is essentially determined by the depth of the sub-structures. However, the arrangement of this second example allows more efficient use of the "parallelism" of the sub-structures and will typically provide a faster shuffle for a given amount of data than the first example, particularly where not all of the sub-structures are filled with data.
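The raster type shuffle of this second example can be modelled roughly as below. The sketch makes several assumptions that are not stated in the text: the rows are processed from the bottom row upward so that a wrapped record never lands on a record that has not yet been moved, the last logical location is empty before the shuffle, and each row costs two steps (buffer load, buffer transfer). The names `raster_push` and `logical_order` are hypothetical.

```python
def raster_push(cols, p):
    """Open a free slot at logical position p in a transposed structure.

    `cols[c][r]` is row r of column (sub-structure) c; logical order runs
    left to right along a row, then on to the next row.
    Returns the number of parallel steps used.
    """
    ncols, depth = len(cols), len(cols[0])
    first_row, first_col = divmod(p, ncols)
    steps = 0
    for r in range(depth - 1, first_row - 1, -1):   # bottom row first
        start = first_col if r == first_row else 0
        # Step 1 (parallel): each column copies its row-r record
        # into its own buffer memory location.
        buffers = {c: cols[c][r] for c in range(start, ncols)}
        # Step 2 (parallel): each buffer transfers to the adjacent
        # column; the last column wraps around to the next row.
        for c, rec in buffers.items():
            if c + 1 < ncols:
                cols[c + 1][r] = rec
            elif r + 1 < depth:
                cols[0][r + 1] = rec
        steps += 2
    cols[first_col][first_row] = None   # slot now free for a new record
    return steps


def logical_order(cols):
    """Read the structure back in raster (row by row) order."""
    return [cols[c][r] for r in range(len(cols[0])) for c in range(len(cols))]


cols = [[10, 40], [20, 50], [30, None]]   # logical order 10,20,30,40,50,None
steps = raster_push(cols, 1)
```

In this small example the records at logical positions 1 onward each move one position along the raster, and the number of steps is bounded by twice the depth of a sub-structure rather than by the total number of records.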

Variations on this basic structure are possible and include (but are by no means necessarily limited to) the following:

The number of memory locations in each sub-structure as a proportion of the total number of memory locations in the memory structure is arbitrary and depends upon design constraints. As the number of memory locations in each sub-structure increases as a proportion of the total number of memory locations in the memory structure, the execution speed of the first step of a shuffle operation is reduced.

The buffer memory location can be varied in size or, indeed, in structure. For example the buffer memory location can comprise two memory locations stacked one upon the other thereby allowing two memory locations in a sub-structure to "overflow" into the buffer memory location.

The structure described in FIG. 1 is particularly useful for speeding up sorting operations when records are stored in the memory structure 1 of FIG. 1 in order by key. As an essential part of any ordering operation it is necessary to make room in the memory structure at arbitrary memory locations so as to insert new records or delete records therefrom. Generally speaking, the nature of the memory structure of FIG. 1 is such that the shuffle operation necessary to make room for the new record depends for its speed of execution only upon the number of memory locations in each sub-structure, not on the total number of memory locations of the whole memory structure. Accordingly very large memory structures containing a large number of memory locations can be shuffled as quickly as a memory structure containing only the number of memory locations to be found in any one of the sub-structures making up the whole memory structure. This particular feature or attribute is utilized in the following description of further preferred embodiments of the invention.

2. CONTENT ADDRESSABLE MEMORY EMBODIMENTS

Unlike other hardware CAMs which combine search logic with memory cells to form one piece of active memory, the embodiments of content addressable memory to be described hereunder decouple the logic from memory, making the memory much easier to fabricate and allowing much more flexibility in the design of the logic circuits. The content addressable memory of the embodiments is hereinafter termed a push-pull content addressable memory (PPCAM).

One special characteristic of the PPCAM is its use of parallel techniques in maintaining the data structure for fast searching, thus dramatically reducing the search hardware required.

PPCAM operates on data directly in memory rather than moving it through the memory hierarchy (eg. from main memory to cache, from cache to register). This non-register based architecture is justifiable only due to the recent advances in memory technology which enable memory speeds to approach speeds of central processing units (CPUs), thus reducing the penalty in direct memory operations.

As software cost continues to increase and hardware cost decreases, hardware based solutions like the PPCAM become more attractive. For example, recent advances in very large scale integration (VLSI) technology allow the PPCAM to be implemented much more cheaply than before.

2.1 PUSH-PULL CONTENT ADDRESSABLE MEMORY (PPCAM)

The PPCAM is based on simple sequential operations and at first glance seems very inefficient. The use of parallel techniques enables the PPCAM to overcome this traditional problem.

Unlike most other hardware CAMs, the PPCAM is based on the sorted-list, and achieves its performance with a low logic per bit ratio by using dedicated hardware, in addition to search hardware, to maintain the data in sorted order.

2.1.1. PPCAM ARCHITECTURE

While the PPCAM can support a number of high level operations the basic operations are INSERT, FIND and DELETE. For the FIND operation, the PPCAM performs a search algorithm (for example a binary search) to locate the required record or records (FIG. 2).

To INSERT, the PPCAM looks for an address in which to insert the record using the FIND operation and then pushes every record following that address down, creating room to insert the new record (FIG. 3). Similarly, to DELETE, the PPCAM covers the record to be discarded with the one below it and pulls all the following records up (FIG. 4).
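The three basic operations can be illustrated with a purely sequential software model. This is a sketch only: the class name `SortedList`, its fields and the use of Python's standard `bisect_left` for the binary search are illustrative assumptions, and the push and pull loops here move one record at a time, whereas the PPCAM performs them in parallel across sub-structures.

```python
from bisect import bisect_left


class SortedList:
    """Fixed-capacity list kept in sorted order by push and pull."""

    def __init__(self, capacity):
        self.mem = [None] * capacity
        self.count = 0

    def find(self, key):
        """FIND: binary search for the address where key is, or would go."""
        return bisect_left(self.mem, key, 0, self.count)

    def insert(self, key):
        """INSERT: find the address, push following records down, store."""
        if self.count == len(self.mem):
            raise OverflowError("memory structure is full")
        addr = self.find(key)
        for i in range(self.count, addr, -1):      # push down
            self.mem[i] = self.mem[i - 1]
        self.mem[addr] = key
        self.count += 1
        return addr

    def delete(self, key):
        """DELETE: cover the record with the one below, pull the rest up."""
        addr = self.find(key)
        if addr == self.count or self.mem[addr] != key:
            return False
        for i in range(addr, self.count - 1):      # pull up
            self.mem[i] = self.mem[i + 1]
        self.count -= 1
        self.mem[self.count] = None
        return True


s = SortedList(8)
for k in (30, 10, 20):
    s.insert(k)
```

Because the records are always held in sorted order, FIND inspects only about log2(n) locations; the cost of INSERT and DELETE is dominated by the push or pull, which is the part the parallel structure is intended to accelerate.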

In the FIND operation, the PPCAM only scans a small part of the data, unlike most other CAMs, which scan all data. This allows PPCAM to have one of the fastest search times of all existing CAMs. Recent hardware CAMs have a search speed of about 1000 Mbits per second; a software CAM on a 12 MHz IBM AT (Registered Trade Mark) type personal computer (PC) can do about 5 Mbits per second (see Computerworld Australia, 25 Aug. 1989). A 16 bit word size PPCAM using the same type of RAM as the PC can do more than 100 Gbits per second, ie. 1,000 times faster than the currently available hardware CAMs and 20,000 times faster than the software tree solution.

Ultra fast search speed such as in the above example comes with a price, namely, the data has to be in sorted order. The INSERT and DELETE operations are used to keep the data in the PPCAM in sorted order. The pushing and pulling of data items in the INSERT and DELETE operations is typically inefficient. In the present embodiment parallel techniques are used to speed them up in the PPCAM.

By its nature the PPCAM provides a method of facilitating sorting and thereby implementation of content addressable memory in a computer. The design consists of a search engine (SE), an operation controller (OC), an input/output interface (IOI) and a push-pull memory (PPM) as shown in FIG. 5.

The four components of the PPCAM mentioned are functional rather than physical. Each can be implemented using software or hardware, depending on specific applications and performance required. The four (conceptual) components are now described.

2.1.2 OPERATION CONTROLLER (OC)

The OC controls the other modules of the PPCAM and prevents internal bus contentions. By reading requests from the host and checking the state of the PPCAM, it activates different modules within the PPCAM to execute the required operations.

Due to the simplicity of the operations of the PPCAM, the OC can be implemented with a few simple logic chips if basic INSERT, DELETE and FIND operations are all that is required, thereby avoiding the fetching, storing, decoding and executing of instructions that occur in most other co-processors.

However, the OC can also be implemented as software executing on a host or a microprocessor dedicated to the PPCAM operations so as to provide more high level functions.

2.1.3 INPUT/OUTPUT INTERFACE (IOI)

The IOI is mainly used to perform conversion between the PPCAM word size and host word size. In situations where the PPCAM word size is the same as the host memory word size, the PPCAM can simply be mapped directly onto the host address space in which case the IOI will not usually be required.

Another function of the IOI is in high performance systems where there are separate data paths to the host and to the mass storage. In this case, the IOI has its own storage interface to reduce the load of the PPCAM on the host data bus. The host and the mass storage can then access different parts of the PPM concurrently.

The PPCAM data structure is sorted and linear. Also it is directly accessible by the CPU. Seamless interface to existing computer systems is possible through the use of memory mapping and function calls. Recent popularity of procedural techniques in programming makes interface to the PPCAM much easier. The PPCAM operations can directly replace search and sort function calls.

Depending on its actual function, the IOI can vary from a few logic gates to a few lines of code to provide interface to the CPU or DMA controllers.

2.1.4 THE SEARCH ENGINE (SE)

The SE is used to perform the look-up operation. It has an address calculator controlled by a comparator. The comparator indicates whether the magnitude of the data under test is greater, smaller or equal to the target data (depending on the actual search algorithm, a 2-way instead of 3-way comparator may be used).

The address calculator may use the binary search technique in general situations, but if the distribution of data is known then a different search technique can be used to produce results faster (eg. dividing by three instead of two).

For the binary search technique, the search engine can be built with simple logic components like adders, shifters and comparators. More sophisticated parallel-pipelined search processors can be used if higher performance is needed, for example the Fibonacci search (see T. Feng, "Parallel Processing", Proceedings of the Sagamore Computer Conference, Aug. 20-23, Springer Verlag 1975).

A Masking Register can be incorporated into the SE. It is then possible to search and sort on specific sub-fields within a word. By setting bits in the Masking Register, the comparator can operate on different parts of the data as required. This is useful for example in situations where an alternate search key is needed in the same set of data.
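The effect of the Masking Register on the comparator can be illustrated as follows. This is a minimal sketch assuming a 16-bit word whose high byte and low byte hold two different sub-fields; the function name `masked_compare` and the field layout are hypothetical and chosen only for illustration.

```python
def masked_compare(word, target, mask):
    """3-way compare of two words on the bits selected by `mask`.

    Returns -1, 0 or 1 according to whether the masked word is smaller
    than, equal to, or greater than the masked target.
    """
    a, b = word & mask, target & mask
    return (a > b) - (a < b)


HIGH_FIELD = 0xFF00   # assumed primary key field
LOW_FIELD = 0x00FF    # assumed alternate key field

on_high = masked_compare(0x12AB, 0x34AB, HIGH_FIELD)
on_low = masked_compare(0x12AB, 0x34AB, LOW_FIELD)
```

The same pair of words compares as "smaller" on the high field and "equal" on the low field. Note that a binary search on a masked field only gives meaningful results if the data is ordered on that field.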

Depending on the performance required, the SE can be executed in software on the host or a dedicated microprocessor, or implemented in custom built hardware. In the case of using a microprocessor, an external comparator is needed to avoid the overhead of moving PPM words into the microprocessor's registers for comparisons. An 8-bit processor can then be used to search a 64-bit wide PPM efficiently, as long as the comparator is 64 bits wide.

2.1.5 THE PUSH-PULL MEMORY (PPM)

The aim of the push-pull memory is to maintain the memory data in sorted order, via the push-pull technique, independent of the host. Each push-pull consists of a sequence of operations to shift data up and down the memory area. A Push or Pull will be performed depending on whether there is an INSERT or DELETE operation.

There are two ways of accomplishing the push-pull operation. The push-pull operation can be performed directly on memory cells after the address decoding circuit (FIG. 6) (Back Push-Pull), or the push-pull operation can be performed using the decoding circuit (FIG. 7) (Front Push-Pull).

The Back Push-Pull is more suitable for VLSI implementation using charge coupled devices (CCDs), shift registers etc., while the Front Push-Pull matches the random access memory (RAM) chips that are readily available, and could also be implemented in VLSI using standard macro-cells. The present application concentrates on the Front Push-Pull type PPM but the invention should not be construed as limited thereto.

The word size of the PPM is mainly dependent on the application. For dedicated applications, the PPM will have the same word size as the data item (record) being manipulated. For more general applications the word size will normally be the same as the host word size.

In high-performance systems, the PPM word size will be a multiple of the host word size. For example, if the host has a 32-bit word size, then the PPM will have a 64-bit or 128-bit word size. This allows much faster push-pull and searching. The increase in word size is not expensive, since unlike the word size of the CPU, the associated logic increase is very small. Also, recent improvements in integrated circuit (IC) packaging technology (eg. reduced pin size in surface mount components) have made wide memory words more feasible even as the number of pins required increases.

2.1.5A PUSH-PULL CONTROL BITS

Functionality of the PPCAM could be improved by adding a few control bits at the end of the PPM words. For example, the push-pull operation could be interrupted in an orderly manner and ambiguous addressing would be possible.

Although the FIND operation is very fast it could be held up by push-pulls initiated by past INSERT or DELETE operations. One way to solve this problem is to allow interruption of the push-pull operation. The nature of the push-pulls allows FIND access to be performed while the push-pull is going on. This is because the list remains in order during the push-pulls.

During the push-pull operation, duplicated records are constantly being created and destroyed. When the push-pull operation is interrupted, it is necessary to leave the data in a consistent state. This can be done by having a few control bits added to the end of each word. This concept, called Push-Pull Control Bits, can be used to disable, identify, and re-order words temporarily.

For example, consider a delete bit at the end of each record, which is normally set to 0 and is set to 1 when a record is deleted. The two duplicate records will be next to each other, and if the delete bit of the "lower" record (the one with the larger physical address) is set to 1, then its value will be larger than that of the record "above" it, and the FIND operation can then be performed properly. After the FIND operation is finished, the push-pull operation can be continued.
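The delete bit example can be made concrete with a small model in which the control bit occupies the least significant bit of each stored word, so that a marked duplicate compares as slightly larger than its unmarked twin. The names `pack`, `pull_step` and `is_sorted`, and the choice of a single trailing bit with unique keys, are assumptions made for illustration only.

```python
def pack(key, deleted=0):
    """Store the key with a delete bit appended as the last bit."""
    return (key << 1) | deleted


def pull_step(mem, i):
    """One step of a pull: cover record i with the record below it,
    then mark the lower duplicate so the list stays strictly ordered."""
    mem[i] = mem[i + 1]
    mem[i + 1] |= 1


def is_sorted(mem):
    return all(a < b for a, b in zip(mem, mem[1:]))


mem = [pack(k) for k in (10, 20, 30, 40)]   # delete key 20 at address 1
pull_step(mem, 1)
after_first = list(mem)     # pull interrupted here: FIND can still run
pull_step(mem, 2)
after_second = list(mem)
```

After each step the two copies of the moved record sit next to each other, and because only the lower copy carries the delete bit the whole list remains in strictly increasing order, so a binary search between steps still behaves correctly.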

The Push-Pull Control Bits concept is also useful in other situations, such as resolving multiple favourable responses (ambiguous addressing): an extra bit could be added to the end of the records which, when set to 1, indicates that they are not unique. This provides a powerful way of handling records with the same content.

2.1.5B PPM WORD SIZE

The word size of the PPM is mainly dependent on the application. For dedicated applications, the PPM will have the same word size as the data item (record) being manipulated. For more general applications, the word size will normally be the same as the host word size.

In high-performance systems, the PPM word size will be a multiple of the host word size. For example, if the host has a 32-bit word size, then the PPM will have a 64-bit or 128-bit word size. This allows much faster push-pull and searching. The increase in word size is not expensive since, unlike the word size of the CPU, the associated logic increase is very small.

The PPCAM can handle data items of different sizes very easily, both early at the hardware design stage and later when in use.

The PPCAM's memory modules could be horizontally cascaded together with no increase in complexity. This allows the hardware designer to tailor the PPM word size for specific applications.

At the usage stage, the data items can span or share PPM words. By using the Masking Bits, a few data items could be manipulated within one PPM word. If the data items are too large, they can span across a few PPM words. This situation can be handled by inserting an address offset when calculating addresses during searching and processing one word at a time.

2.1.6 PARALLEL PUSH-PULL

The main feature that differentiates PPCAM from other techniques is the use of parallel hardware to manipulate data in conventional RAM. This attains not only high speed, but also low cost and high integration.

There are two ways of increasing the push-pull speed by breaking the memory up into separate banks. The aim is to move a few records at the same time by providing additional buffers and data paths.

With the parallel push-pull operation, the more banks being used, the faster the push-pull operation. The speed-up is linear. As long as there are a sufficient number of memory banks, the total time it takes to push-pull any amount of data will be the same as the time it takes to push-pull the data in just one bank of memory. The push-pull time will be constant and independent of the amount of data.

In the first way, termed the Segment Push-Pull, a whole segment of memory words is moved down in one operation (FIG. 8). While FIG. 8 shows a push of three words down, a push-pull distance of any number of words is possible. In order to facilitate such an operation, a number of consecutive sorted data items could be stored in separate banks (FIG. 8).

Since the memory words are interleaved, for each push-pull, words may have to be transferred across the banks. The banks will have their own buffers to store the words in transit, going from one bank to another. These buffers are called Transfer Buffers.

In the second way, termed the Split Push-Pull, a group of evenly spaced words is moved all at once (FIG. 10). Unlike the earlier method, the words are transferred between banks ONLY at the end of the push-pull operation. Here consecutive sorted data items are stored in the same bank until the bank is full (FIG. 11).

One major difference between the Split and Segment push-pulls is in the manner in which data items are moved and stored in the memory banks. The Split Push-Pull moves data physically across the memory banks while the Segment Push-Pull moves data physically up and down the banks. Data in Segment Push-Pull is stored in sorted order across banks while the sorted data is stored down the banks in Split Push-Pull.

Both push-pulls are more efficient than other forms of parallel processing. Since the data is simply moved around there is no degradation in performance due to multiple access and data integrity control. Also the regular structure of the PPM allows it to be implemented in VLSI very easily.

2.1.7 LARGE PUSH-PULL

With the Parallel Push-Pull method there is a performance problem when each data item spans a few PPM words, since each push-pull only moves each of the PPM words by one position.

For every INSERT and DELETE, it takes as many push-pull operations as the number of words in a data item to move the whole item. If an item is five words in size, it will take five times as long to perform an INSERT or DELETE as a data item with a size of one word. This is because with each push-pull only one word can be transferred across the memory banks and a five word size item requires five push-pull operations.

If the size of the Transfer Buffer is increased to at least the size of the data item, a whole data item can be push-pulled across the banks with one push-pull operation. The INSERT and DELETE time will then be independent of the size of the data item (as long as the data item is smaller than the size of one memory bank).

This concept is named the Large Push-Pull. Data of different sizes can be moved with one push-pull, as long as the total size of the data to be moved is not greater than the size of a bank. The Large Push-Pull is effectively increasing the push-pull distance of the push-pull operation, since each push-pull is not restricted to moving data by one word only.

While the Large Push-Pull can push-pull more data, depending on how it is implemented, in most cases it is not possible to push-pull further than the next bank. The Large Push-Pull does increase the amount of memory required slightly. This is acceptable in many situations as memory is cheap and the improvement in speed of push-pull for large data items is from O(N²) to O(N) and thereby quite significant.

2.1.8 JUSTIFICATION FOR PUSH-PULL MEMORY

While the Parallel push-pull schemes allow linear improvement in push-pull speed, it still seems inefficient to move so many records every time there is an INSERT or DELETE operation. One way to increase the speed is to decrease the number of words in each bank. In the limiting case there is only one word in each bank, thus all the words can be moved in just one memory access.

Such high push-pull speed is NOT necessary in most cases. In almost every application the host has some other processes to perform before and after accessing the CAM. An example of such a process is looking up a file from the disk after accessing the index, or sending some information on to the network after modifying a station status. Thus if the incoming data is buffered the push-pull operation can be overlapped with other host operations. This is another form of parallelism which allows the PPCAM to perform some useful work while the host is doing something else. The off-line data structure organization time is used as a leverage against the on-line search time.

The PPCAM's INSERT and DELETE commands can be executed in two phases: the search phase and the push-pull phase. The first phase (search phase), using the FIND operation, is very fast. It finds out where to start performing the push-pull and makes sure that the CPU is not INSERTing the same data or DELETEing non-existent data. Control is then returned to the CPU and the second phase (push-pull phase) occurs concurrently as the CPU continues its execution. Thus to the CPU the INSERT and DELETE speeds look as quick as the FIND speed.

With the parallel push-pull techniques, the worst case scenario (the insertions and deletions are always at the beginning of the PPM banks) will be the same as the average case (random access patterns). The total push-pull time will always be the same as the time needed to push-pull just one bank.

In some applications, the insertions and deletions are always performed at the end of the list (e.g. invoice numbers, dates etc.). There are no push-pull operations in these cases. Furthermore, in most applications, read and change operations far outnumber the insert and delete operations. The fast FIND and slow INSERT, DELETE nature of the PPCAM matches these applications perfectly.

As shall be seen in a later section, when very high INSERT and DELETE speeds are required, a Fast Push-Pull scheme can be used to increase the throughput of these operations.

2.1.9 POSSIBLE IMPROVEMENTS

Instead of pushing and pulling in one direction, such steps can be performed in both directions, choosing the direction that needs fewer push-pull operations. This will decrease the push-pull time when one or two banks of memory are being used. If more than two banks are cascaded together to perform parallel push-pull, the push-pull time will always be equal to the time needed to push-pull a full bank. This feature is probably not worth implementing if there are more than two banks, because of the extra logic required.

Since all banks can be isolated from each other, it is tempting to add an SE to each bank so that the searches can be done in parallel. This is not required in most situations as the speed saving will only be O(log N) for the inclusion of the additional hardware.

If the PPCAM is microprocessor based then higher level functions can be added easily. For example, adding virtual memory functionality is possible by functionally manipulating the IOI unit; the PPCAMs can be used to implement top levels of a B-tree data structure (known per se in the art) that is partially stored on disk. In this manner the CAM can be up to gigabytes in size.

2.1.10 IMPLEMENTATION ISSUES

There are many different ways of implementing PPCAM, from using VLSI to Video RAM. New concepts like the PPCAM require new designs to realise them effectively. A sample implementation of a Split Push-Pull PPCAM is given in the next section.

It is not optimal, but as a prototype it demonstrates PPCAM's main features.

2.2 PPCAM PROTOTYPE (SPLIT PUSH-PULL)

The first hardware prototype description, with reference to FIG. 11, implements the split push-pull approach of FIGS. 9 & 10.

The OC, SE, IOI and the controller part of the PPM are all implemented with a single chip microcomputer and some support logic (FIG. 12).

The support logic provides an external comparator for the SE and switches the address and data buses between the host, the microcomputer and the memory banks. In the prototype high speed counters are used to drive the odd and even address buses.

Note that the data paths are implemented on a single data bus with switches dividing each bank. There are two sets of switches, with every second switch belonging to the same set. By turning the switches on or off, it is possible to isolate or link specific banks with their neighbours, or with the microcomputer. In high performance systems, a second data bus might be required to provide the microcomputer with direct access to the memory banks, by-passing the switches to prevent undesirable propagation delay.

Each PPM memory bank contains two equal size RAM areas linked by a common data bus. One area is used to store even address data, the other area is used to store odd address data. The logic circuit, between the computer and the RAM banks, makes the two areas appear as one continuous piece of memory to the host and SE. In fact, the areas are interleaved. This can be achieved simply by using the least significant bit (LSB) of the incoming address to select the correct area and passing the remaining bits to the selected memory area.
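The LSB-based selection between the even and odd areas can be sketched in a few lines. The function name `select_area` is hypothetical, and the sketch assumes addresses are non-negative integers local to one bank.

```python
def select_area(address):
    """Split a bank-local address into (area, area_address).

    Area 0 is the even area and area 1 the odd area; the least
    significant bit picks the area and the remaining bits address
    a word inside it.
    """
    return address & 1, address >> 1
```

For a bank of eight records, addresses 6 and 7 therefore map to word 3 of the even and odd areas respectively.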

In this prototype, a part of each RAM area is reserved for the Transfer Buffer of the memory bank it belongs to. The Transfer Buffers are used to hold temporary data during the push-pull operations. Another way is to incorporate the Transfer Buffers in the switches between the banks, rather than taking up space in the RAM areas.

The microcomputer performs a push-pull operation by sending the appropriate signals to the memory banks.

For example, the following table shows the required signals in a push (down) operation involving memory areas with a capacity of four records each (the bank can store eight records):

TABLE 1
______________________________________
SAMPLE SIGNAL TABLE
        EVEN AREA (A0)      ODD AREA (A1)
STEPS   Addr     R/W        Addr     R/W
______________________________________
1       down     W          11       R
2       11       R          11       W
3       11       W          10       R
4       10       R          10       W
5       10       W          01       R
6       01       R          01       W
7       01       W          00       R
8       00       R          00       W
9       00       W          up       R
______________________________________

The addresses are in binary (from 00 to 11). The address "down" is the address of the Transfer Buffer of this memory bank and the address "up" is the address of the Transfer Buffer of the memory bank above this one.
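The signal pattern of Table 1 is regular enough to be generated for any area capacity. The following sketch is an illustration only: the function name `push_down_signals` and the tuple layout (even address, even R/W, odd address, odd R/W) are assumptions, not part of the prototype.

```python
def push_down_signals(words_per_area):
    """Return the push (down) steps as (even_addr, even_rw, odd_addr, odd_rw)."""
    bits = max(1, (words_per_area - 1).bit_length())

    def fmt(address):
        # Render an area address as a fixed-width binary string.
        return format(address, "0{}b".format(bits))

    top = words_per_area - 1
    # First step: the last odd word is saved in this bank's Transfer Buffer.
    steps = [("down", "W", fmt(top), "R")]
    for a in range(top, -1, -1):
        # The even word at address a moves to the odd word at address a.
        steps.append((fmt(a), "R", fmt(a), "W"))
        # The even word at a is then refilled from the odd word one address
        # lower, or from the buffer of the bank above when a is 0.
        source = fmt(a - 1) if a > 0 else "up"
        steps.append((fmt(a), "W", source, "R"))
    return steps
```

Calling `push_down_signals(4)` yields nine steps that match the rows of Table 1; in general an area of n words needs 2n + 1 steps.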

The push-pull operation normally goes through the following stages:

[Stages 1 and 2 correspond to Steps 1 to 8 of the table in TABLE 1 and stages 3 to 5 correspond to Step 9.]

1 The memory banks are first isolated from each other by turning all the switches off.

2 Signals similar to those listed out in the table above are then applied to all the memory banks at the same time, thus facilitating data transfer within each bank between the two (odd and even) memory areas.

3 One set of switches (i.e. every second switch) is then turned on to allow communications between pairs of banks. Data from the Transfer Buffer of one bank is then written into the storage area of the other bank.

4 That set of switches is then turned off and the other set of switches is turned on. This allows the banks to be linked with the corresponding other neighbouring bank.

5 Data is then copied from a Transfer Buffer of one bank to the other bank.

The technique used above is called Overlap Switching and it allows the time of inter-bank data transfer to become constant (independent of the number of banks).
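The five stages can be modelled in software to show why the inter-bank transfer takes two phases regardless of the number of banks. This is a simplified sketch under stated assumptions: each bank is a plain Python list, the odd/even split inside a bank is ignored, the loops stand in for hardware that acts on all banks at once, and the name `split_push_down` is hypothetical.

```python
def split_push_down(banks, new_word):
    """Push every word down one position; return the word leaving the last bank."""
    # Stages 1-2: banks isolated, every bank shifts internally (in parallel
    # in hardware). The last word of each bank goes to its Transfer Buffer.
    buffers = []
    for data in banks:
        buffers.append(data[-1])
        data[1:] = data[:-1]
    # Stages 3-5 (Overlap Switching): one switch set links banks (0,1), (2,3)...
    # and the other links (1,2), (3,4)..., so two transfer phases always suffice.
    for first in (0, 1):
        for i in range(first, len(banks) - 1, 2):
            banks[i + 1][0] = buffers[i]
    # The inserted word is written into the freed first slot (assumed here to
    # be done by the microcomputer).
    banks[0][0] = new_word
    return buffers[-1]
```

For three banks holding 1 to 9 in order, pushing in a 0 leaves the banks holding 0 to 8 and returns 9, after one in-bank shift and two transfer phases.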

By putting Transfer Buffers in the switches rather than in the RAM areas, stages 3 to 5 listed above could be replaced by a single stage: data from the Transfer Buffers is copied into a Data Area of the next bank.

This specific design relies on low-cost RAM and advanced single chip microcomputers. The price/performance ratio of these two components has improved substantially in the past few years (e.g. single chip computers having high clock speed, large internal RAM and many peripheral ports cost only $13 (Australian) at the time this specification was drafted).

The overall aim of this prototype is to move all the relatively complex, less frequent operations into software and leave the repetitive, simple operations to hardware. This makes a very versatile PPCAM as it is only then necessary to change the software to adjust characteristics of the PPCAM.

The total cost will be low since no special hardware is required. All the parts are available in large quantities as off-the-shelf components. The bus structure is simple and the PPCAM can be integrated easily with existing architecture through memory mapping.

This prototype by itself should compete quite strongly with existing hardware and software CAMs. It still relies on some software for its operation, but the real power of the PPCAM comes from its simple design, which makes a pure hardware implementation very cost-effective. In mass production, the PPCAM and the single chip computer and its associated software can easily be replaced with hardware logic using either custom or off-the-shelf ICs. This further lowers PPCAM's cost and increases its speed.

2.2.1 PUSH-PULL DISABLE

The push-pull stages presented above only apply to parts of the PPM. When the INSERTs and DELETEs are being performed at the middle of the sorted list it is necessary to disable the push-pull for the data "above" the point of insertion or deletion.

The address where the insertion or deletion is required is termed the Update Address. During a parallel push-pull, three types of memory banks appear:

Type 1: The bank containing the Update Address (where the insertion or deletion is going to occur). Part of the data in the bank is required to be push-pulled.

Type 2: The banks "above" (those with addresses larger than) the Type 1 bank. No push-pull on data is required.

Type 3: The banks "below" (those with addresses smaller than) the Type 1 bank. Full push-pull of all data within the banks is required.

For Type 3 banks, it is necessary to apply signals similar to the ones given in the signal table above to the banks in order to perform a full push-pull of all the data in the bank.

Since the push-pull addresses are going to appear on the dual address buses, a simple decode circuit or ROM can be used to disable the Type 2 banks by using the first few bits of the Update Address.

For the Type 1 bank, the bank is either enabled (for push down operation), or disabled (for pull up operation), at the start of the push-pull. When the push-pull reaches the Update Address, the bank is then disabled (for push down operation), or enabled (for pull up operation). This will ensure that the push-pull operation will only affect the relevant part of the bank.

In order to facilitate the partial push-pull of data within the Type 1 bank a counter can be used to keep track of when to disable or enable the bank.
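The classification of banks into the three types can be sketched as follows. The sketch assumes that banks are numbered in order of increasing address and that each bank holds the same number of words, so that the bank index corresponds to the leading bits of the Update Address; the name `bank_type` is hypothetical.

```python
def bank_type(bank, update_address, words_per_bank):
    """Return 1, 2 or 3 for a bank index, following the three types above."""
    # The leading part of the Update Address identifies its bank.
    update_bank = update_address // words_per_bank
    if bank == update_bank:
        return 1  # contains the Update Address: partial push-pull
    if bank > update_bank:
        return 2  # larger addresses ("above"): push-pull disabled
    return 3      # smaller addresses ("below"): full push-pull
```

With eight words per bank and an Update Address of 19, banks 0 and 1 are Type 3, bank 2 is Type 1 and bank 3 is Type 2.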

2.3 PPCAM PROTOTYPE (SEGMENT PUSH-PULL)

The prototype given is designed for Split Push-Pull. For Segment Push-Pull, it is necessary to provide a connection between the last bank and the first bank since data is moving across the banks with each push-pull in most cases.

Using the previous circuit (FIG. 12) the data path is looped back from the bottom to the top and a switch is placed between the two banks. Conceptually, such an arrangement is similar to a circular ring of banks with data being transferred from one bank to the other.

If the Transfer Buffers are incorporated in the switches, then the inter-bank transfer becomes very fast since all the transfers occur in parallel.

Overlap Switching can also be used in this case. Although the transfer speed will be slightly slower, it doesn't require the buffers to be incorporated into the switches.

2.4 DYNAMIC RAMs AND MULTIPLE ADDRESS RANGES

The use of dynamic RAM in the PPM is not a penalty as the nature of the PPCAM forces most of the RAM to be accessed each time there is an INSERT or DELETE operation. The memory banks above the deletion or insertion can be refreshed concurrently while the RAM below the operation is being pushed or pulled. Since all banks share the same address lines, it is necessary only to disable the writes for those banks.

When there are only FIND operations, all the RAM has to be refreshed explicitly. This is still acceptable because, since all the banks can be isolated and share the same address lines, the refresh procedure can be performed in parallel.

Current dynamic RAMs are getting larger in word depth but not in word size. The PPCAM on the other hand requires wide words and short RAM depth for fast operation (the wider the word size the more data can be moved with each push-pull and the shorter the RAM depth the less data there is to push-pull per bank). A solution is to use Multiple Address Ranges (MAR), which assigns different ranges of addresses within a RAM chip (or a bank) to different applications (FIG. 13).

This is similar to many small logical memory banks within each big physical memory bank, each small memory bank being used by a different application. Rather than using up the whole memory bank or chip, each application uses a fixed address range in the bank. When that address range is used up in one bank, the same address range in the next bank will be used.

Allowing different applications to share the same RAM chip (or bank) not only saves memory space but also allows dynamic adjustment of the depth of the memory banks for different applications.

Since the speed of the PPM is inversely proportional to the depth of the memory banks, this will enable tuning of the performance of the PPCAM for individual applications. In FIG. 13, the address range used by application `a` is smaller than the range used by application `b`, thus application `a` will have better memory performance than `b`.

All that is needed is a table that lists application identifiers and their corresponding address ranges in RAM. This table can itself be stored in the same RAM chip (or bank). The push-pulls and searches simply use this table to limit their range. MAR can easily be implemented in the PPCAM prototype using software.
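A software MAR table of this kind might look like the sketch below. The table contents, the depths of 16 and 1000 words and the names `MAR_TABLE` and `locate` are illustrative assumptions; the only behaviour taken from the description is that each application owns a fixed address range in every bank and moves to the same range in the next bank when one is used up.

```python
# Hypothetical table: application identifier -> (start address, depth) per bank.
MAR_TABLE = {
    "a": (0, 16),     # shallow range, fewer words to push-pull per bank
    "b": (16, 1000),  # deep range
}

def locate(app, index):
    """Map the index-th word of an application to (bank, address in bank)."""
    start, depth = MAR_TABLE[app]
    bank, offset = divmod(index, depth)
    return bank, start + offset
```

For example, word 16 of application `a` falls at address 0 of bank 1, because the 16-word range in bank 0 is already full.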

Another feature of MAR is that, since the PPCAM can be addressed by location, if there is not enough content-addressable data to use up all the RAM in the PPM, the remaining RAM ranges can be used to store normal data. No RAM is wasted as the PPM can be used as part of the host's normal memory.

2.5 PARALLEL GLOBAL OPERATIONS

In some applications, different parts of the application data need to be operated upon with the same operator. Since the PPCAM allows multiple banks to be addressed simultaneously (as long as the data is properly aligned within each bank), multiple locations can be read or written at the same time. This allows fast bulk manipulation of data items.

Note that the same data can be stored in a lot of different locations simultaneously with each write. For example, if all the words in the PPM need to be reset to 0, and there are 1000 words, it will only take the same time as resetting 100 words if we have 10 banks in the PPM, since the banks get reset at the same time.
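The reset example can be checked with a small model that counts write cycles. It assumes equal-depth banks represented as Python lists; the inner loop stands in for hardware that drives all banks in the same cycle, and the name `global_write` is hypothetical.

```python
def global_write(banks, value):
    """Write `value` to every word; return the number of write cycles used."""
    depth = len(banks[0])
    for address in range(depth):   # one write cycle per address...
        for bank in banks:         # ...applied to all banks at once in hardware
            bank[address] = value
    return depth

banks = [[1] * 100 for _ in range(10)]  # 10 banks x 100 words = 1000 words
cycles = global_write(banks, 0)
```

The cycle count equals the bank depth (100 here), not the total number of words.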

Data can also be scanned in bulk at a fast rate. In this case the bus has to be modified so multiple signals can be put on the same bus at the same time (e.g. by the use of open collector drivers). One use will be in the case where we want to select a record out of the PPM by the low or `0` value of a few bits within the record. If there are 10 banks in the PPM, then all 10 banks can be accessed together and the specific bits on the bus can be monitored. When the scanned bits match the required values, then the required record is in one of the 10 records accessed.

By adding some AND and OR circuits to the banks and using the transfer buffers we can perform high level operations on parts of the PPM words in parallel. The PPCAM becomes a simple single instruction multiple data (SIMD) machine that can perform a lot of different data operations (e.g. scanning data strings), thereby off-loading even more work from the CPU.

2.6 FAST PUSH-PULL

The major weakness of the PPCAM is its relatively slow INSERT and DELETE speed compared to the FIND operations. When data updates happen at very high speed, the PPCAM might not have enough time to perform push-pull operations. The following methods can be used to handle sudden bursts of INSERT and DELETE operations.

For INSERT, the data can be stored initially in a special buffer, and the pushes performed afterwards. In some applications, data might come in and then be followed immediately by a large number of FIND operations. In such cases both the PPM and the buffer have to be searched. The sequential scan through the buffer will degrade FIND performance.

A solution is to use MAR and assign a second address range for the incoming data. However, unlike the main address range for that application, the second range is much shorter in depth (e.g. 16 words deep rather than 1000 words deep). The data can be accepted much faster, since there will be less data to push-pull, and the searches will only be slightly slower, although both the second address range and the first address range have to be searched.

Since the searches are quite fast (the data is sorted) and applications tend to use more recent data, if the smaller range is searched first every time, the FIND response might actually improve as it is more likely to find the right data in the smaller range. This is in effect a cache for the application data using the same PPM. The smaller range is caching the larger range.

If the load on the PPCAM decreases, then the data can be merged from the smaller range back to the bigger range using perhaps the Large Push-Pull operation.
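The two-range arrangement can be modelled in software as below. This is a behavioural sketch only: the class name `TwoRangeStore` and its methods are hypothetical, Python lists stand in for the two address ranges, and the merge is done with an ordinary sort rather than a Large Push-Pull.

```python
from bisect import bisect_left, insort

def contains(sorted_range, key):
    """Binary search a sorted list for key."""
    i = bisect_left(sorted_range, key)
    return i < len(sorted_range) and sorted_range[i] == key

class TwoRangeStore:
    def __init__(self, main):
        self.main = sorted(main)  # large, deep address range
        self.small = []           # shallow second range for incoming data

    def insert(self, key):
        # Few words to push-pull, so new data is accepted quickly.
        insort(self.small, key)

    def find(self, key):
        # Search the smaller range first; it acts as a cache of recent data.
        if contains(self.small, key):
            return "small"
        if contains(self.main, key):
            return "main"
        return None

    def merge(self):
        # When the load drops, fold the small range back into the main range.
        self.main = sorted(self.main + self.small)
        self.small = []
```

A FIND costs at most two binary searches, and recently inserted keys are found by the first one.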

For the DELETE operation, the incoming requests will not be buffered. The FIND operation is used to locate the data item to be deleted and then a data invalid bit in the item will be set or a `deleted` code will be written into the item. The pull operations can be delayed until the PPCAM work load reduces. The Parallel Global Operations feature can be used to scan for all the deleted items and to actually delete them (permanently) using the pull operation afterwards.

2.7 POTENTIAL APPLICATIONS

The PPCAM can be used in specialized applications like multiple access control and maintenance of frequently used objects (e.g. track and sector lists for a disk driver or active process lists in real time operating systems), where the data item size is fixed and the frequency of access is high.

An example is virtual memory translation. The PPCAM can be used in existing computer systems with a simple re-write of the virtual memory handling routines. The computer manufacturer can not only increase the speed of the machine's memory system (independent of applications), but also provide added functionality.

The PPCAM can take a huge load off the operating system, allowing the CPU to spend much more time running applications instead of doing process and storage lists management (for example, see H M Deitel, "An Introduction to Operating Systems", Addison-Wesley, 1984). A detailed example on how this could be achieved is given in the next section.

The computer industry is pushing for standards in communications, applications, databases, user-interfaces etc. The PPCAM can provide much faster data conversion between different systems and emulation of other systems. The local value and its corresponding foreign value can form a record and be stored in the PPCAM using one of them as the key, depending on the direction of the conversion or emulation.

Another area is in communications networks. The PPCAM can be used for encoding and decoding data and also maintenance of network parameters like station address and status. Furthermore, the conversion and emulation abilities of the PPCAM will speed up operations in network gateways or communications servers.

More generally, the PPCAM can be used to provide hardware assist to different applications, for example in the database management area. Each PPCAM can store one data table and queries of databases can be executed much more quickly. Even operations like FIND MAX, MIN, NOT etc. can be done in the same amount of time as a normal FIND operation. Partial match, in range test, next record, last record, select, project, join type operations, and integrity controls can also be implemented easily. When more complex operations are required, the PPCAM can be used as a building block for more complex data structures.

One side-effect of the PPCAM is that if disordered records are input into the PPCAM, and then read out again in sequence, they will be in sorted order. Thus PPCAM can also be used as a hardware sorter.

The PPCAM is moving away from typical CAM applications, like memory address mapping, to more high level roles. As more power is added to the OC and SE, the PPCAM can be used as a full function data co-processor. PPCAM's high flexibility and fundamental nature allow it to be used in virtually all situations with large improvement over current techniques.

2.8 DISK CACHE CONTROLLER

One application of the PPCAM is in the implementation of a disk cache controller. Caching or buffering is mainly performed by the operating systems in most computers. A lot of time is spent on maintaining the cache and implementation of disk-access scheduling algorithms.

A solution to this problem is to use a separate processor (which is the approach used in most high-performance systems). However, there is a dilemma here in choosing the right type of processor, which is needed to perform very basic operations on large data chunks.

The location of sectors (sector address) in disk units normally requires at least 4 or 5 bytes to represent the drive number, head number, track number and sector number. Such sizes make the job of maintaining those numbers very difficult for 8 or 16-bit processors. The use of 32-bit processors is an over-kill as a lot of their functions are wasted.

Using the PPCAM will result in a very fast disk-cache controller with low cost. Not only can data be cached at a much finer level, but sophisticated disk scheduling algorithms can be implemented to minimize head seek times.

A sector address list can be stored in the PPCAM as a list of records to indicate the sectors that are currently in the cache. The PPM word size will be the same as the record size. Sorted automatically by the sector address, the list arranges the physically close sectors together on the list.

Two extra bits can be added to the record so that the `Not-Used-Recently` replacement scheme can be used to flush data out to disk. The two bits, called Referenced Bit and Modified Bit, are reset to 0 for a new sector in the cache. They are set to 1 according to whether the sector has been referenced or modified. As time goes on four types of sectors will evolve, according to the bit values:

Type 1--Unreferenced, Unmodified

Type 2--Unreferenced, Modified

Type 3--Referenced, Unmodified

Type 4--Referenced, Modified.

When the cache is full and an old sector needs to be replaced by the new sector it is best to replace type 1 first then type 2, type 3, and type 4 last. Note that type 2 seems illogical but it is actually the result of the periodic resetting of the Referenced Bit. The reason for the periodic resetting of all Referenced Bits is to maintain the ability to distinguish the most desirable sectors to replace as under heavy usage most Referenced Bits will be set after a while. The resetting of the Referenced Bits can be performed in parallel using the Global Operation feature of the PPCAM described earlier.

Besides ease of maintenance, better performance tracking and high access speed, the sorted sector list has two additional desirable effects in the flush operations.

2.8.1 FORCED FLUSH

When the host fetches a new sector that is not in cache, the sector has to be read from the disk. Assuming that the cache is full then one of the sectors in the cache has to be replaced.

Using the SE to locate the position for the new sector in the sector list enables a search to be performed from that location to find a sector of the right type to be replaced. If the sector to be replaced has been modified then the sector has to be written out.

The average seek time is reduced since sectors nearer to the disk head will be tested first in order of replacement. The push-pull time is also reduced as only push-pulling of the records between the new and old sectors in the sector list is needed.
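One possible reading of the forced flush is sketched below. The specification does not say how sector type and distance from the insertion point are weighed against each other; this sketch assumes the lowest type always wins and that ties are broken by closeness in the sorted list. The record layout (sector address, Referenced Bit, Modified Bit) and the names `sector_type` and `choose_victim` are hypothetical.

```python
from bisect import bisect_left

def sector_type(referenced, modified):
    """Map the (Referenced Bit, Modified Bit) pair to Types 1 to 4."""
    return 1 + 2 * referenced + modified

def choose_victim(cache, new_address):
    """Pick the index of the sector to replace.

    cache is a list of (sector_address, referenced, modified) tuples
    sorted by sector address.
    """
    # Position the new sector would take in the sorted list (the SE's job).
    pos = bisect_left(cache, (new_address,))
    # Assumed policy: lowest type first, then nearest to the insertion point.
    return min(range(len(cache)),
               key=lambda i: (sector_type(cache[i][1], cache[i][2]),
                              abs(i - pos)))
```

Choosing a victim close to the insertion point keeps the number of records to push-pull between the old and new entries small.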

2.8.2 VOLUNTARY FLUSH

When the host is not requesting service from the PPCAM, the PPCAM can go through the sector list in order, and write out any sector that has been modified. By writing out the modified sectors and resetting the Modified Bits in this way, the number of seeks the disk head has to perform is reduced as the head only moves continuously in one direction until all writes are finished.
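The voluntary flush amounts to a single in-order pass over the sorted sector list, as in the following sketch. The record layout (a dictionary with `address` and `modified` fields) and the `write_sector` callback are assumptions made for the example.

```python
def voluntary_flush(cache, write_sector):
    """Write out modified sectors in address order and clear their Modified Bits.

    cache is a list of dicts sorted by 'address'; write_sector is a
    hypothetical callback that writes one sector to disk.
    """
    written = []
    for record in cache:
        if record["modified"]:
            write_sector(record["address"])
            record["modified"] = 0
            written.append(record["address"])
    return written
```

Because the list is already sorted by sector address, the writes are issued in ascending address order, which is what lets the head sweep in one direction.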

Note that the resetting of both the Referenced and Modified Bits is done in the background so the host data bus will be free for some other tasks. This disk-cache management scheme could also be adapted for use in virtual memory management systems.

2.9 SUMMARY OF ADVANTAGES OF CAM PREFERRED EMBODIMENTS

While the PPCAM of the preferred embodiments can replace most hardware CAMs, the other major aim is to use PPCAM to replace existing software CAMs. Its flexibility and tunability allow it to be used in general applications, from simple table look-up to inferences in artificial intelligence, thus not only improving the price-performance characteristic of existing applications, but also allowing applications previously not cost-effective to be implemented. The Front Split Push-Pull is emphasized in this specification, but by mixing and matching with other different push-pulls (Back and Segment) and implementation techniques (Large Push-Pull, MAR, Fast Push-Pull, Overlap Switching, Push-Pull Control Bits), it is possible to tailor the PPCAM for almost any application.

The PPCAM of the preferred embodiments provides a powerful means of manipulation of non-numerical data objects. The PPCAM addresses some of the most fundamental areas in computing. For the first time, CAM is available at costs that are virtually the same as location addressable conventional memory. The PPCAM is an advance because of its approach to manipulation of data. It distinguishes itself from most other techniques in the following ways:

(1) Efficient parallel operation: unlike most other techniques, the parallel performance of the PPCAM does not degrade as the number of data items increases. The push-pull performance stays linear, while the search performance is actually better than linear.

(2) Wide data path: the PPCAM's simple design allows an increase in memory word size, to improve performance, with little corresponding increase in logic (or cost). Most CAMs, whether software or hardware, degrade in performance very quickly as the data item size increases. By having a wide data path the PPCAM is much less sensitive to this degradation.

(3) Avoidance of memory hierarchy overhead: the PPCAM operates on memory directly. There is no need to move data from memory to cache to register or vice versa.

(4) Seamless interface to current architecture: the PPCAM maps directly into the computer's normal main memory and can be addressed normally using the data item location or by the contents of the location. Also, the data is sorted and linear, and thus can be manipulated by the computer easily.

(5) Overlapping of instruction execution: the slower PPCAM operations, like INSERT and DELETE, can be overlapped with normal host processing, thus hiding their slowness.

(6) Off-loading of the most time-consuming operations in computing: by off-loading searching, sorting and other bulk data operations, very large gains in application execution speed can be achieved.

(7) Flexible performance tuning: the PPCAM's dynamic tunability allows it to be used in many different applications. The decoupling of logic from memory cells also allows new functionality to be added much more easily.

(8) Handles variable data lengths: the PPCAM's memory modules can be horizontally cascaded together with no increase in complexity, and its memory words can be combined to store larger items. This makes manipulation of variable-length data very simple.

(9) Multiple response resolution: since the data is sorted, records with the same contents are stored next to each other, so multiple matches (responses to FIND) can be handled readily.

(10) Based on conventional technology: the PPCAM can be built with off-the-shelf components or existing VLSI techniques using conventional RAM cells. Due to the lower manufacturing cost of these devices, a large-capacity PPCAM could be realized cheaply. This large capacity is useful not just in handling large data items, but also in improving the speed of operation, since the loading and re-loading of data for different applications is reduced.

(11) Multiple addressing modes: with sorted data, powerful queries can be made on the data with little overhead. Besides addressing by exact match and by location, partial match, greater than, less than, not equal, maximum, minimum, etc. can also be used to address the data.
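Points (9) and (11) both follow from the data being held in sorted order, and a software analogy may make this concrete. The following is a minimal sketch, assuming the keys are held in a sorted Python list and using the standard `bisect` module in place of the PPCAM search hardware; the function names are hypothetical and the prefix test is only one possible reading of "partial match".

```python
from bisect import bisect_left, bisect_right


def find_all(keys, key):
    """Exact match: indices of every record equal to key.

    Equal keys are adjacent in sorted data, so the multiple
    responses form one contiguous range.
    """
    return range(bisect_left(keys, key), bisect_right(keys, key))


def less_than(keys, key):
    """Indices of all records whose key is strictly less than key."""
    return range(0, bisect_left(keys, key))


def greater_than(keys, key):
    """Indices of all records whose key is strictly greater than key."""
    return range(bisect_right(keys, key), len(keys))


def prefix_match(keys, prefix):
    """Partial match: indices of string keys starting with prefix."""
    lo = bisect_left(keys, prefix)
    hi = lo
    # Keys sharing a prefix are contiguous in sorted order.
    while hi < len(keys) and keys[hi].startswith(prefix):
        hi += 1
    return range(lo, hi)
```

In this model the minimum and maximum are simply the first and last entries of the list, and "not equal" is the union of the `less_than` and `greater_than` ranges.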

Some of the benefits provided by the PPCAM include:

(1) On the software side, the PPCAM's ability to replace most current software data structures will result in:

faster program execution

smaller program size

easier performance tuning

lower software complexity

quicker application development

reduced software maintenance costs

more portable software.

(2) On the hardware side, the PPCAM shifts a lot of work from the CPU to active memory. This results in:

reduced hardware complexity

more efficient use of memory

higher functionality

simpler implementation

easier integration.

I claim:
 1. A memory structure for storing records, said structure comprising a plurality of contiguous memory locations each for storing one of said records, said plurality of memory locations being divided into memory sub-structures, each of said memory sub-structures comprising a separate but contiguous sub-portion of said memory structure, each of said sub-structures including a buffer memory location attached thereto, each of said buffer memory locations being arranged to receive a record stored in a memory location within the corresponding sub-structure and to transfer a record stored in that buffer memory location to a memory location within the corresponding sub-structure, wherein each of said buffer memory locations is further arranged to: receive a record stored in one of said memory locations of an immediately adjacent one of said sub-structures; and to transfer a record stored in that buffer memory location to one of said memory locations of an immediately adjacent one of said sub-structures.
 2. The memory structure of claim 1, wherein each of said records includes a search key comprising at least a part of each record and said records are stored in search key order.
 3. The memory structure of claim 1, wherein each of said sub-structures includes an equal number of memory locations.
 4. The memory structure of claim 1, wherein each of said buffer memory locations is the same size as each one of said memory locations.
 5. The memory structure of claim 1, wherein each buffer memory location has the capacity to hold more than one record held in said memory locations.
 6. The memory structure of claim 1, wherein said memory structure performs as a stack, and whereby a record is added to the stack at a chosen one of said memory locations within said memory structure by shuffling all records at and above said chosen memory location up one memory location (UP SHUFFLE OPERATION).
 7. The memory structure of claim 1, wherein said records in said memory structure are maintained in sorted order.
 8. The memory structure of claim 1 further comprising an overlap switching configuration for allowing data to be transferred between said sub-structures concurrently through a two-stage process, said overlap switching configuration comprising switches between each sub-structure, said switches being divided into two sets in which every second switch belongs to the same set, said data transfer being effected by turning said sets on and off so as to link and respectively isolate each sub-structure with neighboring ones of said sub-structures thereby allowing concurrent data transfer between sub-structures.
 9. The memory structure of claim 1 further comprising control bits appended to the record in each occupied memory location, said control bits being maskable and allowing marking of specific ones of said records for use by different applications including identifying identical search keys of different records.
 10. The memory structure of claim 1, wherein the memory structure performs as a stack, and whereby a record is added to the stack at a chosen memory location within said structure by shuffling all records at and below said chosen memory location down one memory location (DOWN SHUFFLE OPERATION) in the memory structure.
 11. The memory structure of claim 1, wherein the memory structure performs as a stack, and whereby a record is added to the stack at a chosen memory location within said structure by shuffling all records sideways in raster format when said sub-structures are transposed.
 12. The memory structure of claim 1, wherein the memory structure performs as a stack, and whereby a record is deleted from said stack by a logically opposite overwrite process.
 13. A method of storing records in a memory structure having a plurality of contiguous memory locations, each for storing one of said records, said plurality of memory locations being divided into memory sub-structures, each of said memory sub-structures comprising a separate but contiguous sub-portion of said memory structure, each of said sub-structures including a buffer memory location attached thereto, each of said buffer memory locations being arranged to receive a record stored in a memory location within the corresponding sub-structure and to transfer a record stored in that buffer memory location to a memory location within the corresponding sub-structure, wherein each of said buffer memory locations is further arranged to: receive a record stored in one of said memory locations of an immediately adjacent one of said sub-structures; and to transfer a record stored in that buffer memory location to one of said memory locations of an immediately adjacent one of said sub-structures; said method comprising the steps of storing said records in said memory structure in search key order, and placing said records into contiguous memory locations in said structure ordered by search key.
 14. The method of claim 13, wherein the memory structure performs as a stack, and whereby a record is deleted from said stack by a logically opposite overwrite process.
 15. The method of claim 13, wherein said memory structure performs as a stack, and said method further comprises the step of adding a record to the stack at a chosen one of said memory locations within said memory structure by shuffling all records at and above said chosen one of said memory locations up one memory location (UP SHUFFLE OPERATION).
 16. The method of claim 15 wherein an UP SHUFFLE OPERATION is performed by dividing said stack into smaller sub-stacks as follows: A. all memory locations in the sub-structure containing said chosen one of said memory locations above said chosen one of said memory locations are treated as a sub-stack, B. all sub-structures above the sub-structure containing said chosen one of said memory locations are treated as sub-stacks, C. a record is popped off the top of each of the stacks and is respectively stored in the buffer memory location attached to the sub-structure immediately above the sub-structure from which that record has been popped, D. all of the sub-stacks are pushed up by one memory location, E. to complete the UP SHUFFLE OPERATION, each record now stored in each buffer memory location as a result of the UP SHUFFLE OPERATION so far is transferred to the bottom memory location in the sub-structure to which that buffer memory location is attached, and F. at the same time as or subsequent to step E, the record to be added to said stack is transferred into said chosen one of said memory locations.
 17. The method of claim 15 wherein said method further comprises the step of overwriting a chosen location by shifting the records of all memory locations below that chosen memory location up one location.
 18. The method of claim 13, wherein the memory structure performs as a stack, and said method further comprises the step of adding a record to the stack at a chosen one of said memory locations within said memory structure by shuffling all records at and below said chosen one of said memory locations down one memory location (DOWN SHUFFLE OPERATION) in the memory structure.
 19. The method of claim 18 wherein a DOWN SHUFFLE OPERATION is performed by dividing said stack into smaller sub-stacks as follows: A. all memory locations in the sub-structure containing said chosen one of said memory locations below said chosen one of said memory locations are treated as a sub-stack, B. all sub-structures below the sub-structure containing said chosen one of said memory locations are treated as sub-stacks, C. a record is pushed off the bottom of each of the sub-stacks and is respectively stored in the buffer memory location attached to the sub-structure immediately below the sub-structure from which that record has fallen, D. all of the sub-stacks are pushed down by one memory location, E. to complete the DOWN SHUFFLE OPERATION, each record now stored in each buffer memory location as a result of the DOWN SHUFFLE OPERATION so far is transferred to the top memory location in the sub-structure to which that buffer memory location is attached, and F. at the same time as or subsequent to step E, the record to be added to said stack is transferred into said chosen one of said memory locations.
 20. The method of claim 18 wherein said method further comprises the step of overwriting a chosen memory location by shifting the records of all memory locations above that chosen memory location down one location.
 21. The method of claim 13, wherein the memory structure performs as a stack, and said method further comprises the step of adding a record to the stack at a chosen memory location within said structure by shuffling all records sideways in raster format when said sub-structures are transposed.
 22. The method of claim 21, wherein a raster type shuffle is performed as follows: A. all sub-structures are in a side by side format, B. for all memory locations below said chosen one of said memory locations at which an insertion is desired, C. each record in the lowermost row of memory locations in all sub-structures is shifted into the corresponding buffer memory location, D. each record in each buffer memory location is shifted into the memory location of the adjacent sub-structure corresponding to the memory location from which that record has just been removed, E. steps C and D are repeated for the next row up of memory locations in all sub-structures until the record in said chosen one of said memory locations is moved, and F. the record to be added to the stack is inserted in said chosen one of said memory locations.
 23. The method of claim 21, wherein a raster type shuffle is performed as follows: A. all sub-structures are in a side by side format, B. for all memory locations above said chosen one of said memory locations at which an insertion is desired, C. each record in the uppermost row of memory locations in all sub-structures is shifted into the corresponding buffer memory location, D. each record in each buffer memory location is shifted into the memory location of the adjacent sub-structure corresponding to the memory location from which that record has just been removed, E. steps C and D are repeated for the next row down of memory locations in all sub-structures until the record in said chosen one of said memory locations is moved, and F. the record to be added to the stack is inserted in said chosen one of said memory locations.
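
By way of illustration only, the UP SHUFFLE OPERATION recited in steps A to F of claim 16 can be modeled in software. The following is a minimal sketch, not part of the claims, under these assumptions: each sub-structure is a Python list whose index 0 is its bottom location, sub-structures are ordered from bottom to top, an empty location is represented by `None`, and the topmost location of the whole structure must be empty before insertion. The function name `up_shuffle` is hypothetical. The loops run sequentially here, whereas in the memory structure each step would act on all sub-stacks in parallel.

```python
def up_shuffle(subs, s, i, record):
    """Insert record at location i of sub-structure s, moving every
    record at and above that location up by one memory location."""
    if subs[-1][-1] is not None:
        raise OverflowError("topmost memory location is occupied")
    n = len(subs)
    buffers = [None] * n  # one buffer memory location per sub-structure

    # Step C: pop the top record of each sub-stack into the buffer
    # attached to the sub-structure immediately above it.
    for j in range(s, n - 1):
        buffers[j + 1] = subs[j][-1]

    # Step D: push every sub-stack up by one memory location. In the
    # chosen sub-structure only locations i and above take part (step A).
    for j in range(s, n):
        start = i if j == s else 0
        subs[j][start + 1:] = subs[j][start:-1]

    # Step E: move each buffered record into the bottom location of
    # the sub-structure to which its buffer is attached.
    for j in range(s + 1, n):
        subs[j][0] = buffers[j]

    # Step F: place the new record in the chosen memory location.
    subs[s][i] = record
```

In this model the number of serial shift steps is bounded by the size of one sub-structure rather than by the size of the whole stack, which is the benefit of dividing the stack into sub-stacks with buffers.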